Introduction

Today, college quality rankings in news magazines and guidebooks are a big busi- 
ness with tangible impacts on the operation of higher education institutions. The 
college rankings published annually by U.S. News and World Report (U.S. News) are 
so influential that Don Hossler of Indiana University derisively claims that higher 
education is the victim of “management” by the magazine. There is certainly support 
for such a claim: college rankings — particularly those of U.S. News — sell millions of 
copies when published, affect the admissions outcomes and pricing of colleges, and 
influence the matriculation decisions of high school students throughout the world. 1

How did academic quality rankings of colleges and universities become so powerful 
in higher education? A review of their historical development in the first section of 
this study may surprise many readers. While college professors and administrators 
alike largely decry rankings today, their origin lies in academia itself. Begun as eso- 
teric studies by lone professors, college rankings’ development into the most popu- 
larly accepted assessment of academic quality was fueled by the very institutions of 
higher education they now judge. While the purpose and design of academic quality 
rankings have evolved during the century since their creation, their history teaches
one clear lesson: college rankings fill a strong consumer demand for information 
about institutional quality, and as such, are here to stay for the foreseeable future. 

Various approaches to college rankings have different benefits and each is subject to 
legitimate criticism, all of which should be seriously considered in light of the power- 
ful effects that a widely-distributed ranking can have on institutions of higher edu- 
cation and the students seeking to enter them. Sections II and III will explore these 
aspects of college rankings, respectively. In light of the historical lessons revealed in 
Section I, however, movements that seek to reform college rankings should be fo- 
cused on producing better rankings, rather than on trying to eliminate or ignore 
them. Section IV will survey multiple new indicators of academic quality that many 
view as potential improvements over the indicators upon which current college rank- 
ings are based. 

The History of Academic Quality Ranking 

Many and various efforts have been made to assess the quality of higher education 
institutions. Accreditation agencies, guidebooks, stratification systems, and rank- 
ings all have something to say about the quality of a college or university but ex- 
press it in very different ways. For clarity, we will adopt higher education researcher 
David Webster’s definition of “academic quality rankings.” For Webster, an academic
quality ranking system has two components: 

1. It must be arranged according to some criterion or set of criteria
which the compiler(s) of the list believed measured or reflected aca- 
demic quality. 

2. It must be a list of the best colleges, universities, or departments 
in a field of study, in numerical order according to their supposed 
quality, with each school or department having its own individual 
rank, not just lumped together with other schools into a handful of 
quality classes, groups, or levels. 2 

All but one of the studies and publications discussed below will fit both criteria and 
so will qualify as “academic quality rankings.” 

Ranking systems that meet these two criteria can be further distinguished by their 
placement within three polarities. First, some rankings compare individual depart- 
ments, such as sociology or business, within a college or university, while others 
measure the quality of the institutions as a whole, without making special note of 
strong or weak areas of concentration. Second, rankings differ by whether they rank 
the quality of graduate or undergraduate education. The judging of graduate pro- 
grams and comparing of individual departments are often coupled together in a 
ranking system. This should come as little surprise considering the specialization of 
graduate-level education. Similarly, ranking undergraduate education usually, but 
not always, involves ranking whole institutions, probably because a well-rounded
education is often viewed as desirable at this level.

More important than what rankings judge is how they do the judging. Most aca- 
demic quality rankings to this point have used one of two primary strategies for de- 
termining quality: outcomes-based assessment or reputational surveys, although 
other objective input and output data such as financial resources, incoming student 
test scores, graduation rates, and so forth have often been used to supplement these 
primary measures. Rankings that look at college outcomes are often concerned with 
approximating the “value-added” of a college or university. They use data about stu- 
dents’ post-graduate success, however defined, to determine the quality of higher 
education institutions and have often relied on reference works about eminent per- 
sons such as Who’s Who in America. Reputational rankings are those which are sig- 
nificantly based on surveys distributed to raters who are asked to list the top de- 
partments or institutions in their field or peer group. 

Either form of academic quality rankings — outcomes-based or reputational — can be
used in departmental or institutional rankings and graduate or undergraduate rank- 
ings. In fact, there have been two major periods in which each method of ranking 
was ascendant: outcomes-based rankings, derived from studies of eminent gradu- 
ates, were published in great number from 1910 to the 1950s, while reputational 
rankings became the norm starting in 1958 and continuing to the present. 3 While 
there has been some renewed interest in outcomes-based rankings recently, they 
have yet to regain parity with reputational rankings in terms of popularity. The rest 
of this section will examine a number of major academic quality rankings through- 
out history, and explore their development from esoteric studies into one of the most 
powerful forces in higher education. 

Early Outcomes-Based Rankings 


The first college rankings developed in the United States out of a European preoccu- 
pation — especially in England, France, and Germany — with the origins of eminent 
members of society. European psychologists studied where eminent people had been 
born, raised, and schooled in an attempt to answer the question of whether
great men were the product of their environment (especially their university) or were 
simply predestined to greatness by their own heredity. In 1900, Alick Maclean, an 
Englishman, published the first academic origins study entitled Where We Get Our 
Best Men. Although he studied other characteristics of the men, such as nationality, 
birthplace, and family, at the end of the book he published a list of universities 
ranked in order by the absolute number of eminent men who had attended them. In 
1904, another Englishman, Havelock Ellis — a hereditarian in the ongoing nature 
versus nurture debate — compiled a list of universities in the order of how many 
“geniuses” had attended them. 4 

Neither author explicitly suggested the use of such rankings as a tool
for the measurement of the universities’ quality. Although there seems to be an im- 
plicit quality judgment in simply ranking universities according to their number of 
eminent alumni, the European authors never made the determination of academic 
quality an explicit goal. However, when Americans began producing their rankings 
with this very aim, they used similar methodologies and data. Many of the earliest 
academic quality rankings in the United States used undergraduate origins, doctoral 
origins, and current affiliation of eminent American men in order to judge the 
strengths of universities. 5 


The first of these rankings was published by James McKeen Cattell, a distinguished 
psychologist who had long had an interest in the study of eminent men. In 1906, he 
published American Men of Science: A Biographical Dictionary, a compilation of short 
biographies of four thousand men whom Cattell considered to be accomplished scientists,
including where they had earned their degrees, what honors they had earned,
and where they had been employed. He “starred” the thousand most distinguished 
men with an asterisk next to their biography. In the 1910 edition of American Men of 
Science, Cattell updated the “starred” scientists and aggregated the data about 
which institutions these men had attended and where they taught at the time, giving 
greater weight to the most eminent than to the least. He then listed the data in a ta- 
ble with the colleges in order of the ratio of this weighted score to their total number 
of faculty, thereby creating the first published academic quality ranking of American 
universities. 6 

Cattell understood that he was making a judgment about these institutions’ quality 
as evidenced by his titling the table “Scientific Strength of the Leading Institutions,” 
and his claim that “[t]hese figures represent with tolerable accuracy the strength of 
each institution.” Cattell was also aware that prospective students would be inter- 
ested in the judgments of quality. He wrote, “Students should certainly use every ef- 
fort to attend institutions having large proportions of men of distinction among their 
instructors.” Furthermore, Cattell claimed that the “figures on the table appear to be 
significant and important, and it would be well if they could be brought to the atten- 
tion of those responsible for the conduct of the institutions,” implying a belief that 
the rankings represented a judgment of quality that could be improved over time if 
the institutions took the correct actions. 7 

Although Cattell’s first study was not based purely on the measured outcomes of the 
institutions he ranked, it was central to the development of later outcomes-based 
rankings. Cattell himself would continue to publish similar studies in which he 
judged institutions of higher education based on the number of different eminent 
people — not just scientists — they both produced and employed, without ever funda- 
mentally altering his methodology. From Cattell’s 1910 study until the early 1960s, 
the quality of institutions of higher education would be most frequently judged using 
this method of tracking the educational background of distinguished persons. 8 

One researcher who was greatly influenced by Cattell’s work, but who even more ex- 
plicitly dealt with the quality of academic programs, was a geographer from Indiana 
University named Stephen Visher. Interested in why geographical areas demon- 
strated a disparity in the number of scientific “notables” they produced, Visher 
looked at the undergraduate education of the 327 youngest “starred” scientists in 
Cattell’s 1921 edition of American Men of Science. Such an approach tested the hy- 
pothesis that the disparities resulted because “the leaders come from those who are 
greatly stimulated in colleges.” He ranked the top seventeen institutions by the ratio 
of the young “starred” scientists to total student enrollment, thereby creating the 
first enrollment-adjusted outcomes-based ranking. Visher suggested that the rank
demonstrated the “comparative success of these institutions in inspiring under- 
graduate students,” and argued that “[t]he conspicuous contrasts... in the number 
and percentage of graduates who later become leaders suggest that there are 
marked variations in the stimulating value of institutions.” 9 
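
The enrollment adjustment described above can be illustrated with a minimal sketch. It assumes the ranking is simply the count of young "starred" scientists divided by total enrollment, sorted from highest to lowest; the function name and all figures are hypothetical and are not Visher's data.

```python
from typing import Dict, List, Tuple


def rank_by_ratio(notables: Dict[str, int],
                  enrollment: Dict[str, int]) -> List[Tuple[str, float]]:
    """Order institutions by notables per enrolled student, highest first."""
    ratios = {name: notables[name] / enrollment[name] for name in notables}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures for illustration only; not taken from Visher's study.
notables = {"College A": 6, "College B": 10, "College C": 3}
enrollment = {"College A": 500, "College B": 4000, "College C": 1000}

ranking = rank_by_ratio(notables, enrollment)
for position, (name, ratio) in enumerate(ranking, start=1):
    print(position, name, f"{ratio:.4f}")
```

In this made-up example the institution with the most notables in absolute terms (College B) falls to last place once enrollment is taken into account, which is the kind of contrast an enrollment-adjusted ranking is meant to expose.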

Beverly Waugh Kunkel, a biologist at Lafayette College, and his co-author Donald B. 
Prentice, then president of the Rose-Hulman Institute of Technology, repeatedly 
used a methodology similar to that of Cattell and Visher, but stated their interest in 
the academic quality of universities even more explicitly. In their first study, pub- 
lished in 1930, Prentice and Kunkel expressed interest in “what elements constitute 
a successful institution,” especially in light of the large investments that individuals 
were making in their educations. The authors believed that “undoubtedly the most 
reliable measure” of a higher education institution was “the quality of product.” 
Therefore, Prentice and Kunkel measured academic quality by the number of a col- 
lege’s undergraduate alumni listed in Who’s Who in America. 10 

Kunkel and Prentice repeated essentially the same methodology in periodic studies
from 1930 to 1951. They ranked schools according to the number of baccalaureate- 
earning alumni who were listed in Who’s Who. 11 In the 1930 study, the authors pro- 
vided a table ranking the schools by absolute number of graduates listed and a sec- 
ond table ranking them according to the percentage of a school’s living alumni who 
were listed. The authors noted that an overrepresentation of ministers and college 
professors and an underrepresentation of engineers in Who’s Who likely skewed the 
results of their rankings. In the 1951 study, the authors listed the schools alpha- 
betically with the absolute number of alumni listings and their numerical rank. This 
later study did not include a percentage-based ranking, but instead focused on the
time period from which the listed alumni graduated, hoping that this might be of 
use in identifying good practices for those familiar with an institution’s historical op- 
erations. 12 

One final early study that deserves mention is the first and last attempt by the fed- 
eral government to explicitly compare academic quality among institutions. In 1910, 
the Association of American Universities (AAU) asked Kendric Charles Babcock,
higher education specialist in the Bureau of Education, to publish a study of the
undergraduate training at colleges so that graduate schools would be able to know 
which applicants were best prepared. The Bureau of Education was chosen because
the AAU believed that the rankings would be more widely accepted if they were
compiled by an impartial source without a connection to a university.

Babcock’s study was a stratification and not a ranking. When finished, it divided
344 institutions into four different classes rather than supplying an individual rank 
to each school. As with most of the early studies mentioned above, Babcock meas- 
ured quality based on the outcomes an institution produced — here, the performance 
of schools’ graduates after they entered graduate school — but he was not greatly influenced
by Cattell’s quantitative, eminent-person methodology. 13 On visits to several
graduate schools, Babcock “conferred with deans, presidents, and committees on 
graduate study,” and “inspected the credentials and records of several thousands of 
graduate students... in order to ascertain how such students stood the test of trans- 
planting.” 14 

The accidental release of a draft of the study to the newspapers resulted in such a 
furor from the deans and presidents of colleges classified lower in the rankings that 
President Taft issued an executive order prohibiting the study’s official release. The 
Commissioner of Education, P.P. Claxton, tried to soothe the disgruntled by admit- 
ting that the classification was “imperfect” because its single criterion of graduate 
school performance failed to account for the fact that many colleges may perform 
very highly in their provision of services to those students who do not go on to 
graduate school. Neither Claxton’s explanations nor the praise the classification received
from some deans and presidents (mostly from class I schools) was enough to
convince President Wilson to rescind Taft’s order when asked to do so by the AAU
upon his arrival in the White House. 15 This historic episode demonstrates one rea- 
son why the federal government has never attempted to rank or in any way judge 
the comparative academic quality of higher education institutions since. 

The Rise of Reputational Rankings 

Reputational surveys would become the predominant method for producing aca- 
demic quality rankings beginning in 1959, with the most popular ranking today, one 
by U.S. News and World Report, containing a strong component of reputational 
evaluation. However, this methodology was developed much earlier, in 1924, by Ray- 
mond Hughes, a chemistry professor at Miami University in Ohio. When asked by 
the North Central Association of Schools and Colleges to complete a study about 
graduate school quality, Hughes would turn to the opinion of his fellow faculty instead
of relying on the then-popular outcomes-based methodology. 16

Hughes circulated two requests to Miami University faculty in twenty fields of 
study — nineteen liberal arts disciplines and the professional discipline of education. 
The first sought from each faculty member a list of forty to sixty instructors who 
taught their discipline in American colleges and universities. The second request 
asked the recipients to rate, on a scale of one to five, the departments of thirty-six 
institutions that offered a degree in their discipline, so as to create “a list of the
universities which conceivably might be doing high grade work leading to a doctor’s
degree.” 17

Hughes received about a 50 percent response rate. After weighting the ratings, 
Hughes produced a table that listed the departments according to how many 1, 2, 3, 
4 or 5 ratings they had received. Although he did not calculate an overall score for 
each department, the ordered list, based on a specific criterion, meets the definition 
of an academic quality ranking, the first such list determined by a department’s reputation
among selected raters. Hughes also did not aggregate the ranks of the departments
to form an institution-wide ranking.

During his chairmanship of the American Council on Education, Hughes would 
publish another study in 1934 on graduate school quality of much wider scope. 
Hughes’s second study was an improvement over the first in many respects. First, 
the 1934 study covered thirty-five disciplines as opposed to the twenty disciplines in
his earlier study. The second study also gathered opinions from a more diverse field 
of respondents; to compile his list of raters, Hughes asked the secretary of each discipline’s
national society for a list of one hundred scholars who would fully represent
the field and its sub-fields, resulting in a greater number of respondents for each 
discipline. However, while the 1934 study helped to refine the reputational method- 
ology, it was not a ranking. Instead of listing the departments in order of their rat- 
ing, Hughes simply listed alphabetically any department that at least half of the rat- 
ers had judged as adequate. 18 

Though developed during the same period as the outcomes-based rankings, the
reputational methodology of judging academic quality was largely absent for twenty- 
five years after Hughes’s 1934 study. It would reappear in the appendix of Graduate 
Study and Research in the Arts and Sciences at the University of Pennsylvania, pub- 
lished by a humanities professor, Hayward Keniston. The ranking was compiled in 
connection with work he was doing for the University of Pennsylvania in 1959 to 
help compare it to other American research universities. Although Keniston’s rank- 
ing did not gather much attention beyond the walls of his institution, its publication 
nonetheless marks the beginning of the decline of outcomes-based rankings and the 
rise of reputation-based rankings, a shift that would be complete a decade later. 

The ranking in Keniston’s study relied solely on the opinions of twenty-four depart- 
ment chairpersons at each of twenty-five top universities. The universities from 
which the raters came were chosen based on their membership in the Association of 
American Universities, the number of doctorates granted, and their geographical distribution.
Keniston was only interested in comprehensive research universities comparable
to the University of Pennsylvania, so schools such as the Massachusetts Institute
of Technology and the California Institute of Technology were not included due to
their technical nature, and Michigan State and Penn State were not included be- 
cause of their limited Ph.D. programs. 19 

Once Keniston identified the raters, he asked them to rank the fifteen strongest de- 
partments in their discipline at the twenty-five universities to which he had sent 
surveys. With an 80 percent response rate yielding about twenty different rankings
per discipline, Keniston aggregated the rankings of departments into the four
broad categories of humanities, social sciences, biological sciences and physical sci- 
ences. He also aggregated the disciplinary ratings into institution-wide rankings, 
making his study the first institution-wide ranking determined through reputational 
surveys. It should be noted that Keniston’s choice of which disciplines to include 
seems to have been influenced by a desire to improve the University of Pennsylvania’s
position in the rankings. Eleven of the twenty-four disciplines came from the humanities,
including Oriental studies and Slavic studies, in which the University of Pennsylvania
ranked eighth and sixth respectively (the university’s two highest ranks
overall), while engineering, one of the university’s less prestigious departments,
was left out. 20

From 1959 to 1966, the reputational methodology quietly gained ground in the 
world of academic quality rankings. After Keniston, five reputational rankings were
completed, one of them unpublished, but none received any special attention. An
Australian geographer published a ranking of American geography departments in 
an Australian journal in 1961. In 1964, Albert Somit and Joseph Tanenhaus published a
book-length study of American political science departments in which they ranked the
top thirty-three graduate departments. In 1966, Sam Sieber (with the collaboration
of Paul Lazarsfeld) ranked departments of education according to raters’
views on their research value, subscribers to Journal of Broadcasting ranked broadcasting
graduate programs, and Clark Kerr, then president of the University of California,
created an unpublished ranking of medical schools affiliated with universities
that belonged to the AAU. During this time, however, outcomes-based rankings had not
yet disappeared; the well-known psychologist Robert Knapp was still publishing 
rankings based on academic origins up until 1964. 21 

The ascendancy of reputational rankings can be said to have truly started with the 
methodological advances of Allan Cartter, who published the 1966 Assessment of 
Quality in Graduate Education (Cartter Report). The Cartter Report ranked twenty-nine
disciplines, similar to Hughes’s and Keniston’s studies, but it was an improvement
over these reputational rankings in several significant ways. First, it polled
senior scholars and junior scholars in addition to department chairpersons, resulting
in almost 140 rankings per discipline and providing a larger and more diverse body
of opinions than the earlier rankings. Second, Cartter had raters rank the disciplines
at 106 different institutions, more than any previous reputational ranking.
Finally, the Cartter Report ranked the departments according to the two criteria of 
“quality of the graduate faculty” and “rating of the doctoral training program,” 22 in- 
stead of just one criterion, as in both Hughes studies and in Keniston’s ranking. 

The respondents rated each department on a scale of one to five. In addition to 
ranking the departments, Cartter stressed his interest in quality by providing labels 
based on their scores. All those rated 4.01 and up were labeled “distinguished”
and those rated 3.01 to 4.00 were labeled “strong.” Cartter listed departments scoring from
2.01 to 3.00 alphabetically, split in the middle between those
labeled “good” and “adequate plus.” Although Cartter did not aggregate his depart- 
mental ratings into institution-wide rankings, three other authors performed the 
task with his data after its publication. Cartter also provided more analysis of his 
own rankings than any previous reputational study, including geographical distribu- 
tion of highest ranked departments, relationship between ranking and faculty com- 
pensation, and the correlation between faculty publications and their score for 
“quality of graduate faculty.” 23 
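
The score bands described above can be expressed as a small lookup, shown here as a rough sketch rather than a reconstruction of Cartter's procedure. The function name is hypothetical. Because the text does not give the exact cutoff separating "good" from "adequate plus," the sketch leaves that band undivided, and it assumes scores are reported to two decimal places and that the lowest labeled band ends just below 3.01.

```python
def cartter_label(score: float) -> str:
    """Map a mean rating on the one-to-five scale to a descriptive label."""
    s = round(score, 2)  # assumption: scores are reported to two decimals
    if s >= 4.01:
        return "distinguished"
    if s >= 3.01:
        return "strong"
    if s >= 2.01:
        # The split point between these two labels is not stated in the text.
        return "good or adequate plus"
    return "unlabeled"


# Hypothetical scores for illustration only.
for example in (4.37, 3.5, 2.6, 1.9):
    print(example, cartter_label(example))
```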

Cartter’s ranking not only had the most comprehensive methodology to date, it also 
enjoyed the best critical reception. It attracted wide attention, earning more reviews
than any previous reputational ranking study, most of them positive. In
addition to praise from higher education officials, magazines such as Time and Sci- 
ence lauded the assessment provided by the study. Once published, the report sold 
approximately 26,000 copies. 24 This commercial success and critical acclaim can be 
understood as one of the prime reasons reputational rankings became the over- 
whelming norm after 1966. 

In 1970, Kenneth Roose and Charles Andersen sought to replicate Cartter’s study, 
although with a self-admitted goal of de-emphasizing the “pecking-order” of the first 
ranking. Roose and Andersen’s A Rating of Graduate Programs rounded departments’
ratings to one decimal place rather than two, resulting in more ties. Yet only the
ranks, and not these scores, were published. No descriptive labels were assigned
to a program’s score, and the Roose/Andersen study only provided the ordinal
rank of the departments based on their “faculty quality” score. For the “program ef- 
fectiveness” score, Roose and Andersen simply listed departments in order of their 
scores without including an ordinal position. 25 

Additionally, the Roose/Andersen study expanded the number of disciplines included
to thirty-six, the number of institutions at which these disciplines were
ranked to 130, and the number of usable responses to approximately 6,100 (from
4,000 in the Cartter Report). Like Cartter, Roose and Andersen did not aggregate
the departmental ratings into institution-wide rankings, but publications such
as Newsweek and the Journal of Higher Education did. Despite this expanded scope, 
decreased importance of the “pecking-order,” and significant press coverage, the 
Roose/Andersen study did not receive the same reception as did the Cartter Report
four years earlier. Indeed, it received much criticism from academics, one of whom 
complained that the rankings did not reflect recent increases in quality at newer and 
developing universities. 26 

When published in 1982, the Assessment of Research-Doctorate Programs in the 
United States (Assessment), produced by the National Academy of Sciences in con- 
junction with the National Research Council, was the largest academic quality rank- 
ing project ever undertaken. The Assessment rated a total of 2,699 programs at 228 
different institutions and provided detailed data about hundreds of these programs. 
The entire study cost more than $1.1 million in 2008 dollars before it was com- 
pleted. Unfortunately, the large amounts of information collected were not presented 
in an easily understandable fashion. For each measure of quality, the programs 
were listed alphabetically by institution with their standardized scores listed next to 
them. Since several dozen programs were listed for most disciplines, each rank was 
difficult to determine. Furthermore, no attempt was made by the authors of the 
study to aggregate these scores into institution-wide rankings. 27 

In addition to being the largest, the Assessment was also the first major reputational 
study to include non-reputational measures as well. Of the sixteen measures used, 
four were reputational, while the others covered areas such as program size, charac- 
teristics of graduates, library size, research support and publication records. How- 
ever, the reputational measures were the most widely reported in news outlets, from 
the Chronicle of Higher Education to the Washington Post. Of twelve scholarly works 
about the NAS Assessment, all mentioned one or more reputational measures while
only three discussed any of the other twelve measures. All of the twenty-nine news 
articles covering the study reported the ranks of programs according to at least one 
of the reputational measures but only seven reported the rankings according to any 
non-reputational measure. 28 

A follow-up study produced by the National Research Council in 1995 studied much 
of the same data as the Assessment, but with an increased scope. The Research-Doctorate
Programs in the United States: Continuity and Change ranked forty-one disciplines
at 274 institutions, for a total of 3,634 programs, at a cost of $1.7 million
in 2008 dollars. The 1995 report improved on the reporting method of the earlier
Assessment by publishing the names of institutions in rank order for each discipline,
although it did not provide the numerical rank in the tables. It used twenty
measures of quality: three reputational, eight related to faculty research, two related 
to doctoral students and seven related to Ph.D. recipients. Once again, although 
many different quantitative measures were reported in the study, the rankings 
based on the reputational measurements received the most attention. 29 The next 
version of this graduate program ranking is expected to be officially released by the 
National Research Council shortly. 30 

Re-Emergence of Undergraduate Rankings 

The rise of reputational rankings starting in 1959 coincided with a decreased inter- 
est in ranking both institutions and programs for undergraduate study. Although 
undergraduate study was the primary concern of many of the early rankings, such 
as the unpublished Bureau of Education stratification and the outcomes-based, 
academic origins studies of Visher and Kunkel and Prentice, all of the reputational 
studies mentioned above focused on graduate-level programs. However, just as the 
first reputational studies were published during the ascendancy of outcomes-based 
rankings, a few reputational undergraduate rankings were published during the 
dominance of reputational graduate studies from the late 1950s to the early 1980s. 

The first undergraduate ranking compiled through a reputational methodology was 
a little-known study published in 1957 by journalist Chesly Manly in the Chicago
Sunday Tribune. In an article, Manly ranked the top ten universities, coeducational 
colleges, men’s colleges, and women’s colleges according to their undergraduate 
quality, in addition to the top law and engineering graduate schools. The ranking 
was also the first with a reputational component to rate whole institutions rather 
than just departments or disciplines. Although the methodology was never fully dis- 
closed, Manly claimed to have based the rankings on the opinions of a few dozen 
“consultants” in addition to a “great mass” of quantitative data. 31 

Another ranking that covered both undergraduate and graduate education is The
Gourman Report, authored by Jack Gourman. Gourman began evaluating colleges
in 1955 and began publishing the results in 1967; the latest edition
appeared in 1997. Although the ranking offers a rating to two decimal places
for each school included, the methodology used is almost completely unknown. 
Gourman has said that the final score is calculated by averaging scores on ten factors,
which have more to do with administration policies than with any typical measure
of reputation or student success. He has never revealed exactly what data were
gathered, who gathered them and how, or how they were compiled into the precise
final score. Additionally, experts have claimed that the pattern of the scores, in which
many schools are separated by only one-hundredth of a point with no large gaps
between them, is almost statistically impossible. Regardless of the opacity and
questionable results, however, the Gourman ratings have historically been used in
many scholarly reports, especially by economists
interested in the relationships between college quality and other variables such as 
alumni earnings and student choice. 32 

The first post-1959 undergraduate ranking developed with methodological rigor was
a pilot study by Lewis Solmon and Alexander Astin, published in 1981. They pro- 
vided raters from four states — California, Illinois, North Carolina, and New York — a 
list of 80 to 150 departments in each of seven different fields: biology, business, 
chemistry, economics, English, history, and sociology. They asked the raters to 
choose the best quality department from this list in their field and, in addition, to 
judge each department on six criteria: overall quality of undergraduate education, 
preparation for graduate school, preparation for employment, faculty commitment to 
teaching, scholarly accomplishments of faculty, and innovativeness of curriculum. 33 

Solmon and Astin found that a rater’s overall opinion of an undergraduate depart- 
ment was highly correlated with his or her opinion about the department’s 
“scholarly excellence of faculty” and “commitment to undergraduate teaching.” The 
first criterion was also highly correlated with whether the institution had a strongly-rated
graduate program, which resulted in the top undergraduate programs appearing largely as a
reflection of the Cartter and Roose/Andersen studies of top graduate programs. Since Solmon
and Astin were interested in providing a new list of undergraduate quality, rather than
simply repeating the same schools as those earlier studies, they excluded any institution
that had received a high rating in the Roose/Andersen study from the results they reported in
Change magazine. With those institutions left, they ranked departments in each 
field according to how many times each ranked in the top ten of the six criteria. In a 
second table, they listed institutions by how many of their departments had been 
listed in the top ten in their respective disciplines. One interesting result was that 
some typically highly-regarded colleges were ranked among the top in fewer than five 
of the disciplines. 34 

Rankings in Popular Publications 

The face of academic quality rankings would be revolutionized with the undergradu- 
ate reputational ranking first published in 1983 by U.S. News and World Report. 

Even though authors of academic quality rankings from as early as Cattell — the very 
first — have noted the interest that prospective students might have in knowing the 
ranks of different institutions and departments, rankings would not play a large role 
in helping high school students choose what college to attend until U.S. News began 
publishing “America’s Best Colleges.” Until then, academic quality rankings were the 
province of professors and higher education administrators. Because they were published as
studies by researchers in little-circulated academic books and journals, few college-bound
students would have been able to find them, and the rankings were often too obscure to
be very helpful even if located. Compiled by editors and published in a highly-circulated
news magazine, the U.S. News rankings became more widely read and more influential
than any ranking that had come before them. 35

The first three U.S. News rankings were entirely based on reputational surveys. The 
schools were broken into categories according to their Carnegie classification, and 
college presidents were asked to name the top undergraduate institutions. In 1983, 
1308 presidents were sent surveys and U.S. News received about a 50 percent re- 
sponse rate. The ranking was repeated two years later in 1985. The raters were once 
again college presidents, but in this ranking they were asked to select the top five 
undergraduate schools from a provided list of institutions similar to their own. For
the third ranking in 1987, the surveyed presidents were asked to rate the top
ten institutions. By the third ranking, the response rate had increased to 60 percent,
but the criticism from academia had gained momentum as well. A number of
the presidents who had refused to respond argued that neither they nor their fellow 
presidents had the ability to judge the academic quality of any institution except 
their own. 36 

The rankings in 1988 departed from the first three rankings in response to the in- 
creased criticism. Reportedly in consultation with college presidents and academic 
experts, U.S. News made two major methodological changes. First, they surveyed the 
opinions of academic deans and admissions officers in addition to those of college 
presidents, arguing that this would more adequately cover differing conceptions of 
quality. Second, they reduced the reputational component to just 25 percent of the 
overall ranking and determined the remaining 75 percent of a school’s score using 
objective input and output data such as admissions selectivity, faculty strength, 
educational resources, and graduation rates. 37 Additionally, 1988 marked the beginning
of the annual publishing of the ranking, as well as the year that U.S. News began
publishing a book-length college guide, America’s Best Colleges, which included
further in-depth information about the schools included in the rankings. 38

Over the past two decades, the methodology for the U.S. News undergraduate rank- 
ings has been through numerous incremental adjustments. In 1995, respondents 
were asked to consider the quality of undergraduate teaching in their ratings and 
the weight placed on the graduation rate was increased, due to a greater concern for 
measuring outcomes. 39 This interest in outcomes-based measurements further de- 
veloped into what U.S. News called a “value-added” criterion in 1996, in which the 
ranking used a model to predict a school’s expected graduation rate, using input
factors such as test scores of incoming students, and then compared the predicted
rate to the school’s actual graduation rate. The higher the actual rate was above
the predicted rate, the higher the school’s rank. 40
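
The value-added comparison described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the schools, test scores, and
graduation rates are invented, and a one-variable least-squares line stands in for
whatever prediction model U.S. News actually used, which is not specified here.

```python
from statistics import mean

# Hypothetical data: (school, average test score of entrants, actual graduation rate in %)
schools = [
    ("A", 1000, 60.0),
    ("B", 1100, 75.0),
    ("C", 1200, 74.0),
    ("D", 1300, 91.0),
]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

def value_added(data):
    """Return (school, actual minus predicted rate) pairs, largest gap first."""
    slope, intercept = fit_line([s[1] for s in data], [s[2] for s in data])
    gaps = [(name, actual - (intercept + slope * score)) for name, score, actual in data]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)

ranking = value_added(schools)
```

In this invented sample, school B places first even though school D has the highest
raw graduation rate, because B exceeds its predicted rate by the widest margin.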

In 1999, U.S. News began to standardize the data used in the calculation of the 
rankings in order to bring their process more in line with accepted statistical prac- 
tices. This allowed the outcomes to reflect the size of differences between schools in 
each component of the rankings, rather than simply the respective ranks. In 2000, 
they further tweaked their calculations by adjusting for the ratio of graduate stu- 
dents to undergraduate students at each school in order to eliminate a bias towards 
schools that spend large amounts of research money that primarily benefit graduate 
students rather than undergraduates. Related to methodological changes, the reli- 
ance on the Carnegie classifications to break schools into ranking categories has re- 
sulted in seemingly drastic changes for some schools from one year to the next. For 
example, Carnegie has updated their classifications twice since 2000. This means 
that schools can suddenly appear or vanish in one ranking category, even though 
there has been no significant change in their characteristics. 
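
The effect of the 1999 move to standardized data can be illustrated with a short
sketch. The figures below are hypothetical, and the use of z-scores is an assumption
about what "standardize" means here; the point is only that standardized scores keep
the size of the differences between schools, while simple ranks discard it.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def ranks(values):
    """Rank values from 1 (highest) downward; assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

# Hypothetical graduation rates (%) for three schools
grad_rates = [95.0, 94.0, 60.0]

rank_view = ranks(grad_rates)   # [1, 2, 3]: every gap looks like one step
z_view = z_scores(grad_rates)   # first two nearly tie; the third is far behind
```

Under ranks alone, the one-point gap between the first two schools counts the same as
the thirty-four-point gap between the second and third; the standardized scores do not
treat them alike.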

Some observers have attacked the magazine for these constant changes in method- 
ology, arguing that they produce changes in some schools’ ranks that are only re- 
flective of the adjustments in the ranking’s weighting rather than any true change in 
quality at the institution. However, college rankings scholar David Webster has ap- 
plauded U.S. News for its constant tinkering as a demonstration of their receptive- 
ness to criticism and constant striving for improvement. Indeed, in 1992, Webster 
considered U.S. News’s undergraduate rankings “by far the best such rankings ever 
published.” 41 An additional criticism, however, is that there have been many reports 
over the years of schools deliberately “fudging” their data or taking non-quality re- 
lated steps in order to increase their U.S. News rank. 42 

Two smaller but still relatively widely-circulated rankings deserve mention at this 
point. Shortly after U.S. News began publishing their undergraduate ranking annu- 
ally, Money released its first annual “America’s Best College Buys” in 1990. Rather 
than judging schools only by their quality, however, Money sought to rank schools 
according to their value — that is, the amount of quality per dollar of tuition spent. 
Money used statistical analysis to determine how much a college should be expected 
to cost based on a number of factors: test scores and class ranks of incoming stu- 
dents, faculty resources and quality, library resources, graduation and retention 
rates, and academic and career success of alumni. The schools were then ranked 
according to how far their actual “sticker price” tuition is below their predicted tui- 
tion. Beyond this, little more is known about Money’s methodology. One interesting 
result of ranking by value, however, was that Money’s top schools were highly diverse
in their academic quality, price, and public/private affiliation. 43

In The Best 368 Colleges, The Princeton Review publishes numerous top-twenty
rankings of various categories based on surveys completed by students at the 368 
institutions profiled in the book. The edition published in 2008 contained sixty-one 
different rankings in the eight overall categories of academics, politics, demograph- 
ics, quality of life, parties, extracurricular activities, schools by type, and social life. 
The rankings are determined by an eighty-one-question survey that students fill out 
about their own campuses. The respondents are nearly or completely self-selected. 
According to the Princeton Review website, in recent years 95 percent of the surveys 
have been filled out electronically by those students who sign up online, and they 
can be completed “anytime, anywhere.” While clearly the least methodologically rigorous
of the popular rankings, the Princeton Review’s rankings nonetheless earn significant
media attention and, with the name of a major college-prep company behind them,
almost certainly influence high school students’ enrollment decisions. 44

The most recent academic quality ranking published by a popular news magazine 
was in a few respects a return to origins. Although the 2008 study was released on 
the website of Forbes magazine, it was largely developed by an academic — Richard 
Vedder, an economist at Ohio University and the director of the Center for College 
Affordability and Productivity. 45 In contrast to the input-heavy U.S. News ranking, 
Vedder’s ranking was largely based on academic outcomes. Measurements such as 
the enrollment-adjusted number of alumni listed in Who’s Who in America and the 
number of faculty winning national and international awards reflect the method of 
some of the very earliest academic quality rankings, such as Cattell, and Kunkel and 
Prentice. Additionally, the ranking measured student opinions of their professors as 
self-reported on Ratemyprofessors.com, the amount of debt students held at gradua- 
tion, the colleges’ four-year graduation rate, and the number of students winning 
nationally competitive awards. According to the article accompanying the rankings, 
Vedder’s aim for the Forbes/CCAP ranking was to judge colleges from a student’s
perspective. The ranking’s methodology was designed to account for students’ pri- 
mary concerns, such as the quality of undergraduate teaching and the outcomes of 
an education in terms of graduation, debt, and post-graduation career success. 46 

Popular news magazines have not limited themselves to ranking undergraduate edu- 
cation. Only a few years after publishing their first undergraduate ranking, U.S. 
News and World Report produced their first ranking of top professional schools in 
1987. This first edition ranked medical, law, engineering, and business graduate 
programs based completely on the opinions collected from surveys sent to deans of 
departments in those same fields. 47 Similar to the way their undergraduate rankings 
developed, the graduate and professional school rankings by U.S. News have expanded
and incorporated more objective data over the years. The most recent edition
ranks institutions in twenty-two different professional and graduate fields. The
rankings of the five professional fields (the original four fields plus education) each 
have a unique methodology, but they all include some measure of reputation among 
peers and employers and a measure of selectivity (in the form of schools’ acceptance 
rates and entering students’ test scores, GPA, etc.). Other factors used include fac- 
ulty resources, job placement rates, and research strengths, though none of these 
factors is used in all five fields. The rankings of the other seventeen graduate disci- 
plines — which can be grouped into six umbrella fields of sciences, library studies, 
social sciences and humanities, health, public affairs, and fine arts — are still com- 
pletely determined by reputational surveys of department heads and directors of 
graduate study in the relevant discipline. 48 

U.S. News appears to enjoy the same dominance in the graduate school ranking 
business that they have in the undergraduate rankings. Although some individuals 
and professional organizations produce alternative rankings — such as law professor 
Brian Leiter’s Law School Rankings — no other regular large-circulation publication 
provides rankings covering the number of graduate and professional programs in- 
cluded by U.S. News. 49 The graduate school rankings produced by the National Re- 
search Council are by far the most comprehensive of such rankings and perhaps re- 
ceive the most attention in the academic community, but their irregular publication 
(1982, 1995, and forthcoming) and convoluted organization of information (see 
above discussion) have limited their popular influence. 

The one exception to U.S. News’s supremacy is in the field of business. Numerous 
major publications dealing with finance or business produce their own rankings of 
business schools or MBA programs. Business Week, Forbes, and the Wall Street 
Journal publish national rankings of U.S. business schools or MBA programs in ad- 
dition to U.S. News. The Economist and Financial Times both publish international 
rankings of MBA programs. Although each ranking uses different methodologies and 
components, all measure graduates’ salaries and/or post-graduation career develop- 
ment. 50 As such, the correlations between the most recent MBA rankings are rela- 
tively high, ranging from 0.65 to 0.85 when comparing only those schools that are 
listed in each ranking (excluding the Wall Street Journal; see Table 1). 51
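
A comparison of this kind, restricted to schools that appear in both lists, might be
computed as in the sketch below. The type of correlation used in the study is not
stated, so Spearman's rank correlation is an assumption, and the school labels and
ranks are invented for illustration.

```python
def spearman_on_shared(rank_a, rank_b):
    """Spearman rank correlation over schools listed in both rankings (no ties assumed)."""
    shared = [school for school in rank_a if school in rank_b]
    n = len(shared)
    # Re-rank the shared schools within each list so positions run from 1 to n
    pos_a = {s: i for i, s in enumerate(sorted(shared, key=rank_a.get), start=1)}
    pos_b = {s: i for i, s in enumerate(sorted(shared, key=rank_b.get), start=1)}
    d_squared = sum((pos_a[s] - pos_b[s]) ** 2 for s in shared)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical MBA rankings from two publications (school -> rank)
publication_a = {"P": 1, "Q": 2, "R": 3, "S": 4, "T": 5}
publication_b = {"Q": 1, "P": 2, "R": 3, "T": 4, "U": 5}

rho = spearman_on_shared(publication_a, publication_b)
```

Schools S and U are dropped because each appears in only one list, which mirrors the
restriction to schools listed in each ranking.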

Global Rankings 


A major development in academic quality rankings of higher education is the recent 
move to global comparisons. In just the past five years, two rankings have emerged 
that claim to list the best colleges in the world. The Shanghai Jiao Tong University’s 
“Academic Rankings of World Universities” (ARWU), first published in 2003, pioneered
this new frontier. Developed by the university in order to compare China’s
rapidly growing system of higher education to its international competitors, it has 
been adopted by many other nations for the same purpose, particularly after its 
publication in such international magazines as The Economist. The rankings are de- 
termined by the number of alumni and staff who have won Nobel Prizes or Fields 
Medals over the past 100 years (10% and 20%, respectively), the number of highly- 
cited researchers employed by a university (20%), the number of articles published 
in Nature and Science (20%), and the number of articles in the Science Citation In- 
dex-Expanded and the Social Science Citation Index (20%). A final component ad- 
justs this research output according to the size of the institution (10%). 52 Although 
these measures include labels such as “Quality of Education” and “Quality of Fac- 
ulty,” the ranking essentially measures a university’s research prowess and has little 
to say about other educational measures. 
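
The weighting scheme described above amounts to a weighted sum of indicator scores.
The sketch below uses the percentages given in the text; the indicator names, the
0-100 scale, and the example scores are assumptions made for illustration, not
details taken from the ARWU methodology.

```python
# Weights as listed in the text (they sum to 1.0); the key names are hypothetical
ARWU_WEIGHTS = {
    "alumni_awards": 0.10,     # alumni winning Nobel Prizes or Fields Medals
    "staff_awards": 0.20,      # staff winning Nobel Prizes or Fields Medals
    "highly_cited": 0.20,      # highly-cited researchers
    "nature_science": 0.20,    # articles in Nature and Science
    "citation_indices": 0.20,  # articles in the two citation indices
    "size_adjusted": 0.10,     # research output adjusted for institution size
}

def composite(scores, weights):
    """Weighted sum of indicator scores; every weighted indicator must be present."""
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical university with indicator scores on an assumed 0-100 scale
example = {
    "alumni_awards": 50.0,
    "staff_awards": 40.0,
    "highly_cited": 60.0,
    "nature_science": 70.0,
    "citation_indices": 80.0,
    "size_adjusted": 30.0,
}
total = composite(example, ARWU_WEIGHTS)
```

Because 90 percent of the weight rests on awards, citations, and publications, a
school's total in a scheme like this is driven almost entirely by research output.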

The Times Higher Education of England quickly developed its own global ranking 
system after the appearance of ARWU. Published annually since 2005, the Times 
ranking uses a mixture of reputation, research output, and other quantitative input 
data to determine the top schools. Peer review surveys count for 40 percent of a 
school’s score with another 10 percent based on surveys of employers. The remain- 
ing half of the ranking is determined by the percentage of international staff (5%) 
and students (5%), the number of research citations per staff member (20%) and the 
student-to-staff ratio (20%). 53 The article accompanying the 2008 rankings noted 
that the top universities can generally be characterized as English-speaking 
(predominately American and British schools, with some from Canada and Austra- 
lia) and as independent from government control (although heavily reliant on gov- 
ernment funding). The Times nonetheless defends its ranking’s international charac- 
ter by noting that the top two hundred schools are located in twenty-eight different 
countries. 54 Others find it easy to argue the existence of a bias in favor of schools lo- 
cated in the U.K. and its former colonies. 55 Regardless, the Times rankings receive 
wide coverage in foreign press. A sure indication of their continued growth in influ- 
ence came in 2008 when rankings guru U.S. News and World Report published the 
Times results under its own banner in the United States. 56 

Rankings at the global level have emerged for reasons similar to those behind their 
initial development at the national level. Some of the first national rankings in the 
U.S. were driven by graduate institutions’ interest in the academic quality of pro- 
grams to help them determine the intellectual strength of applicants. Today, institu- 
tions of higher education are interested in knowing the quality of foreign schools as 
they seek international partnerships with institutions of comparable strength. Na- 
tional rankings in the U.S. also became increasingly common as more and more 
Americans attended college after high school, fueling the demand from consumers
for information about quality. Similarly, the increasing number of students studying 
in foreign countries and the increasing competition to attract these students gener- 
ates both a larger demand for and a greater influence of global academic quality 
rankings. 57 Finally, global rankings have also grown in importance as they have pro- 
vided assessment and accountability for institutions of higher education, with some 
foreign governments going so far as to base the amount and direction of money 
spent on higher education on institutions’ placements. 58 

II. Contributions and Criticisms of College Rankings 

While college rankings have been consistently subjected to serious and accurate 
criticisms since the furor raised in reaction against the Bureau of Education’s 1911 
stratification, they nonetheless serve important functions. Academic quality rank- 
ings help to make a concern for excellence in higher education publicly visible and 
active. The spirit of competition produced by rankings encourages universities to 
perform better, and combats the pitfalls of institutional stagnation that could de- 
velop in its absence. 59 Furthermore, rankings meet a widespread demand for pub- 
licly-accessible comparative information about institutions that students will spend 
tens of thousands of dollars to attend. Indeed, multiple projects throughout the his- 
tory of academic quality rankings have mentioned the better informing of students’ 
college decisions as a major driving force behind the creation of their rankings, including
Cattell, Kunkel and Prentice, U.S. News, and Forbes/CCAP.

James Schmotter, then assistant dean of Cornell’s Johnson Graduate School of 
Management, argued in 1989 that colleges and universities had only themselves to 
blame for the rise of college rankings because higher education had failed to put for- 
ward its own system of evaluating quality that was relevant or intelligible to con- 
sumers. 60 Two decades later, this criticism remains accurate, and college rankings 
continue to be an important starting place for students attempting to sort through 
the thousands of higher education institutions in the United States. In light of these 
important functions that rankings provide, Kevin Carey, a researcher for Education 
Sector, writes that higher education’s anti-rankings sentiment is illegitimate be- 
cause it reflects “an aversion to competition and accountability that ill serves stu- 
dents and the public at large.” 61 

General Criticisms 

Academic quality rankings do, however, face many legitimate criticisms that should
always be acknowledged and weighed in their discussion and use. One widely accepted
criticism is that an ordinal ranking of colleges according to quality can produce
what Gerhard Casper, then president of Stanford University, has called “false
precision.” Casper argued that the methodology of U.S. News was not precise 
enough to accurately identify the difference in quality between an institution ranked 
#1 and #2 or even #10, even though these separate ranks denote such a difference. 
This criticism can almost certainly be applied to all current and previous rankings. 
Rankings scholar Marguerite Clarke has further claimed that there may even be little
difference between schools ranked #15 and #50, and found evidence that
changes in an institution’s rank should be viewed more as “noise” than as precise 
changes in quality relative to its peers. To overcome the problem of false precision, 
Clarke advocates listing schools alphabetically in broader “quality bands” in which 
each school belonging to the same band would be considered largely of equal qual- 
ity. 62 
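
A banded listing of the kind Clarke advocates could look like the sketch below. How
the bands should be drawn is not specified in the text, so the fixed ten-point band
width is an assumption, and the school names and scores are invented.

```python
from collections import defaultdict

def quality_bands(scores, width=10):
    """Group schools into score bands of a given width, alphabetical within each band."""
    bands = defaultdict(list)
    for school, score in scores.items():
        floor = int(score // width) * width  # e.g. 87.5 falls in the band starting at 80
        bands[floor].append(school)
    # Highest band first; names inside a band carry no order of merit
    return {floor: sorted(names) for floor, names in sorted(bands.items(), reverse=True)}

# Hypothetical overall scores
scores = {"Delta": 88.0, "Alpha": 81.5, "Beta": 79.9, "Gamma": 72.0}
bands = quality_bands(scores)
```

Within a band the listing is alphabetical, so Alpha appears before Delta despite its
lower score. A fixed cutoff still separates schools with nearly identical scores, such
as Alpha and Beta here, so banding reduces false precision without removing it.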

One specific result of a highly-precise, ordinal ranking is that changes in the meth- 
odology of the ranking system can produce changes in an institution’s rank without 
any change in the institution’s quality. Minute adjustments to what criteria are 
used, how these criteria are weighted, and how institutions are classified and cho- 
sen for ranking can result in wide swings or even complete disappearance of a 
school’s rank, even though the characteristics of the institution remain un- 
changed. 63 In fact, one study published in May of 2008 by two mathematicians at 
the University of California, Berkeley, revealed that the U.S. News rankings are 
highly volatile. By varying the weights placed on the different criteria while using the same data
as U.S. News, these researchers concluded that a school’s “specific placement... is
essentially arbitrary.” While the data which comprise the rankings are useful both 
for institutions and for students, the researchers argued that because weightings 
reflect nothing more than individual preferences, published rankings should leave 
criteria weighting to individual readers. 64 
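
The sensitivity of ranks to weighting can be shown with a toy example. The sketch
below is not the Berkeley researchers' method; it simply assumes two invented schools
scored on two invented criteria and reorders them under two different sets of weights.

```python
def rank_schools(scores, weights):
    """Order school names by weighted composite score, best first."""
    def total(name):
        return sum(weights[c] * scores[name][c] for c in weights)
    return sorted(scores, key=total, reverse=True)

# Hypothetical scores on two criteria, held fixed throughout
scores = {
    "X": {"reputation": 90.0, "graduation": 70.0},
    "Y": {"reputation": 75.0, "graduation": 88.0},
}

reputation_heavy = rank_schools(scores, {"reputation": 0.7, "graduation": 0.3})
graduation_heavy = rank_schools(scores, {"reputation": 0.3, "graduation": 0.7})
```

The underlying data never change, yet the two schools trade places when the weights
are swapped, which is the sense in which a specific placement reflects the compiler's
preferences rather than a difference in the institutions themselves.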

This research also helps to support two additional common criticisms of rankings. 
First, it is argued that one ranking system based on a certain set of criteria and 
weighting cannot possibly judge the quality of all institutions of higher education in 
a fair and accurate manner. Colin Diver, president of Reed College, which has refused
to cooperate in providing information to U.S. News for over a decade, has argued
that higher education is too complex a product and that consumers’ individual
preferences are too diverse for all institutions to be fairly judged by a single
ranking scale. 65 Second, considering the lack of consensus on the exact definition of
“academic quality,” every ranking system makes a subjective value judgment about 
which criteria represent “quality” in higher education. The choice of which measures 
to use in a ranking implicitly and somewhat arbitrarily defines the meaning of 
“quality.” 

The above criticisms are not intended to condemn college rankings as a product. The
subjectivity and at times arbitrariness of construction and adjustment of methodologies
is an argument for the expansion of college rankings, not their dismissal. In light of the
demand for the information provided by college rankings, multiple rankings with diverse
methodologies should be encouraged. This would allow consumers to evaluate the individually
subjective criteria of each ranking system and adhere to the one that most
closely reflects their own preferences, or to take an average of an institution’s rank 
across many different judgments of its quality. A diversity of popularly accessible 
ranking systems would also help to mitigate the perverse incentives created when 
one ranking system is hegemonic, as discussed below. 
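
Taking an average of an institution's rank across several rankings, as suggested
above, is straightforward to sketch. The three rankings and the schools in them are
hypothetical, and averaging only over the lists that include a school is an
assumption about how missing entries would be handled.

```python
from statistics import mean

def average_rank(rankings):
    """Average each school's position over the rankings that list it, best first."""
    positions = {}
    for ordered in rankings.values():
        for place, school in enumerate(ordered, start=1):
            positions.setdefault(school, []).append(place)
    averaged = {school: mean(places) for school, places in positions.items()}
    return sorted(averaged.items(), key=lambda item: item[1])

# Hypothetical rankings built on different definitions of quality (best school first)
rankings = {
    "reputation": ["A", "B", "C"],
    "outcomes": ["B", "C", "A"],
    "value": ["B", "A", "C"],
}
combined = average_rank(rankings)
```

A reader who trusts one definition of quality more than the others could instead
consult that single list, which is the other use of multiple rankings noted above.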

Furthermore, the subjectivity of choosing a definition of quality does not prevent the 
evaluation of different ranking methodologies according to their ability to measure 
quality once defined. Although the choice of methodology may implicitly affect a 
study’s definition of quality, it can be assumed that most compilers of rankings be- 
lieve that quality is at the same time something independent of the criteria used to 
measure it. For example, most persons interpret an institution’s reputation as an 
indication of other criteria related to their definition of quality — such as excellence of 
teaching, strength of academic services, etc. — rather than viewing a good reputation 
as itself the definition of quality. Once quality is subjectively defined for a given 
ranking system, then it is possible to make an objective evaluation of that ranking’s 
ability to accurately measure this quality. 

Reputational Rankings 

With this in mind, it is useful to look at the specific strengths and weaknesses of the 
different approaches to academic quality ranking. Reputational surveys are one of 
the most widely used criteria in current ranking systems and carry some distinct 
benefits. Webster has argued that reputational criteria take advantage of the knowledge
of “those who supposedly know most about academic quality,” be they university
presidents, academic deans, department heads, or employers. Furthermore,
they produce rankings that have “face validity” in that the results are consistent 
with the public’s intuition. 66 An institution’s reputation, whether among higher edu- 
cation officials, employers, or the general public, has been earned for a reason, after 
all. It seems reasonable to believe that the better an institution’s reputation, the 
higher the institution’s quality, at least at one time. Finally, a good reputation may 
be self-reinforcing. A school with a reputation for high quality may feel extra pres- 
sure to maintain its reputation and, therefore, have greater concern for its academic 
quality. 

However, reputational criteria have serious drawbacks. The first is that there is evi- 
dence that an institution’s reputation comes from sources that may not accurately 
reflect its strength in criteria that are widely accepted as good indicators of academic 
quality. Studies have demonstrated that an institution’s reputation is closely corre- 
lated with its research productivity and its size. Publication and citation records, 
however, probably have little, and at best an indirect, effect on the quality of an in- 
stitution’s teaching and the knowledge of its graduates. 67 

More importantly, even if reputation is derived from those characteristics used to de- 
fine academic quality, it is highly questionable whether respondents are knowledge- 
able enough to judge this quality at the numerous institutions they are asked to 
rate. In the 1982 National Academy of Sciences ranking discussed above, respon- 
dents reported that they were, on average, unfamiliar with one-third of the programs 
they were asked to rate. 68 Patricia McGuire, president of Trinity University, views as 
“preposterous” the idea that college presidents have the ability to rate hundreds of 
institutions on a scale of one to five — as they are asked to do by U.S. News. 69 This 
lack of informed judgment by raters was revealed by the “halo effect” in Solmon and 
Astin’s 1981 ranking of undergraduate departments, where less prestigious depart- 
ments were rated more highly based on the overall reputation of their institution. In 
one extreme, Princeton’s business department was ranked among the top ten, even 
though Princeton did not have an undergraduate business program. 70 

The “halo effect” demonstrates that at least some reputational survey respondents 
rate programs and institutions without the requisite knowledge to make an informed 
judgment of quality. This information problem is of even greater concern now that 
global rankings with significant reputational components have emerged. A recent at- 
tempt by Germany’s Center for Higher Education Development to include Switzer- 
land’s German-speaking universities in its well-regarded ranking system was hob- 
bled by Swiss professors’ considerable lack of knowledge about German schools. 
Considering this disconnect between countries that border each other and speak the 
same language, it is highly questionable whether reputation can be used as an accurate
ranking criterion on a global scale. 71

Another serious concern related to raters’ lack of information about the institutions 
they are ranking is that reputational rankings — particularly annual ones — may be 
creating a feedback mechanism for institutional reputations that delays the realization
in the ranking of a true change in quality at an institution. Figures 1 and 2
show that the correlation between institutions’ rankings in consecutive years has
increased to and stabilized near one hundred percent in the last decade, meaning
there is little change in the rankings overall from one year to the next. Furthermore,
since 1993 the reputational “peer assessment” component of the U.S. News rankings 
has increased in correlation to the overall ranking for both national universities and 
liberal arts colleges. In 2009, these correlations to the overall ranking were the high- 
est of any other component for liberal arts schools and the second highest for na- 
tional universities (see Tables 2 and 3). 
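As an illustration of what a year-to-year rank correlation measures, the following minimal sketch computes Spearman's rank correlation between two consecutive years of rankings. The text does not say which correlation statistic underlies Figures 1 and 2, so the choice of Spearman's formula is an assumption, and the six ranks used here are hypothetical rather than actual U.S. News data.

```python
def spearman_rank_correlation(ranks_a, ranks_b):
    """Spearman's rho for two tie-free rankings of the same schools.

    ranks_a[i] and ranks_b[i] are the ranks of school i in two different years.
    """
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two rankings of the same length (at least 2 schools)")
    n = len(ranks_a)
    # Sum of squared rank differences across schools.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for six schools: only the second and third swap places.
last_year = [1, 2, 3, 4, 5, 6]
this_year = [1, 3, 2, 4, 5, 6]

rho = spearman_rank_correlation(last_year, this_year)
print(round(rho, 3))
```

A value near one means that almost no school changed position between the two years, which is the pattern the paragraph above describes.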

One possible explanation of these phenomena is that raters have become more accu- 
rate in their judging of peer institutions’ true quality, in that they rate institutions 
more in line with their overall rank as determined by the other components, and 
that this quality is rather stable from year to year. In light of other claims made 
against reputational evaluation, however, a more likely explanation may be that uni- 
versity presidents and admissions officials, who are asked to rate institutions about 
which they have little information, are turning to the most readily available source of 
information on academic quality: the previous year’s U.S. News rankings. Brewer, 
Gates, and Goldman draw this conclusion, arguing that U.S. News rankings can cre- 
ate a positive feedback loop in which “institutions that are prestigious today are 
more likely to have a high level of prestige tomorrow.” 72 

Having little familiarity with even their peer institutions, academic officials are likely
to allow a school’s previous rank to affect their contemporary assessment of its academic
reputation. Even if only indirectly, the sheer ubiquity of the U.S. News rankings
suggests a high likelihood that previous years’ rankings influence current
administrators’ opinions about the schools they are asked to rank. Although institutions’
characteristics of quality are likely to be relatively stable from year to year,
raters relying on past rankings create a feedback loop in which a school’s ranking in one
year helps to solidify its position in the next year’s ranking, so that its rank
responds more slowly when actual changes in its quality do occur. 73
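The lag described above can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each year's reputational score is a weighted blend of the previous year's score and the institution's true quality; the blending rule, the `reliance` parameter, and all numbers are hypothetical and are not drawn from any study cited here.

```python
def reputation_path(true_quality, start, reliance):
    """Simulate a reputational score over several years.

    true_quality: the institution's actual quality in each year.
    start: the reputational score before the first year.
    reliance: share (0 to 1) of each year's score carried over from the
        previous year's score; the remainder reflects true quality.
    """
    if not 0 <= reliance <= 1:
        raise ValueError("reliance must be between 0 and 1")
    score = start
    path = []
    for quality in true_quality:
        score = reliance * score + (1 - reliance) * quality
        path.append(score)
    return path


# Hypothetical school whose true quality jumps from 50 to 70 and stays there.
quality = [70, 70, 70, 70, 70]

direct = reputation_path(quality, start=50, reliance=0.0)  # raters ignore past scores
lagged = reputation_path(quality, start=50, reliance=0.8)  # raters lean on past scores

print([round(x, 2) for x in direct])
print([round(x, 2) for x in lagged])
```

Under these assumptions, raters who ignore past scores register the improvement immediately, while raters who lean heavily on them approach the new level only gradually.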


Luke Myers and Jonathan Robe 


Input- and Outcome-Based Rankings 

Several rankings have responded to criticisms of reputational criteria by including 
objective data about input variables in their calculations. The 1982 NAS study, U.S. 
News, Money, and Times Higher Education all used input variables such as library 
size, student-faculty ratio, incoming students’ test scores, and amounts of educa- 
tional expenditure either in place of or as a supplement to reputational surveys. 
While the strengths of these criteria include their quantitative nature and claim to 
objectivity, such input measures do not necessarily correlate with academic quality. 
Like a university’s faculty research output, input measures are at best only indirect
measures of the education provided at an institution. 74

The primary alternative to reputational and input criteria is data about outcomes. 
Few, if any, other goods are judged by the strengths of the inputs used in their crea- 
tion rather than the strength of the final output. As noted above, Prentice and 
Kunkel’s 1930 study makes a compelling argument that institutions of higher edu- 
cation can best be judged by their “quality of product.” 75 Indeed, as Webster has 
noted, all colleges seek to prepare their students for success in life after graduation; 
therefore, outcomes-based assessment of schools’ ability to do so seems appropriate. 

Unfortunately, because schools may have different ideas about the definition of suc- 
cess for which they are preparing students, outcomes-based rankings are especially 
vulnerable to the above-noted criticism that no single ranking system can fully judge
the complex product of higher education. A more significant criticism, however, is
that outcomes-based rankings suffer from a time lag. However post-graduation success
is defined, and even if it is defined in a way that applies to all institutions of
higher education, it takes time for graduates to become (or fail to become) success- 
ful. For example, it may take graduates twenty to thirty years to be listed in Who’s 
Who in America, a common measurement used throughout the history of academic 
rankings. In that amount of time entire faculties of colleges and universities may 
have changed, meaning that the ranking based upon such time-lagged outcome 
components may have little relation to the current quality of an institution. 

One final criticism of outcomes-based criteria is that they may not significantly differ 
from some input-based components. The research of Alexander Astin, for example, 
has demonstrated that a school’s production of eminent alumni is largely dependent 
on the students’ abilities when entering college. 76 Although value-added measures 
that compare outcomes to inputs would mitigate this problem, no major ranking has 
been compiled solely or even primarily based on value-added criteria, probably because
value-added data are rarely collected on a systematic basis.

III. Effects of College Rankings

These criticisms of academic quality rankings need to be carefully considered in 
light of the dramatic increase in their popular consumption over the past twenty-five 
years. Before 1983, academic quality rankings were largely compiled and used by 
and for institutions of higher education. These esoteric studies had begun to receive 
increased attention in the press in the second half of the twentieth century, but it 
was not until U.S. News and World Report began ranking schools annually in the 
late 1980s that an academic quality ranking was both easily understandable and 
circulated to millions of readers. With such exposure, the U.S. News rankings be- 
came the publicly accepted authority on academic quality, and today a school’s 
ranking may have significant effects on the students it attracts, its admissions
process, and the price of its tuition. While increased publicity carries many benefits in
the form of better information for consumers, these effects arguably create
perverse incentives for institutions of higher education.

Impact of U.S. News on Colleges 

A school’s annual rank in U.S. News almost certainly has a relation to the caliber of
students it is able to attract. A study by Patricia McDonough et al. found that
59.9 percent of college freshmen described college rankings in newsmagazines as
“not important at all” in their matriculation decision, 29.6 percent found them
“somewhat important,” and only 10.5 percent found them “very important.” 77
While this report has been used to downplay
the effect of college rankings on matriculation decisions, such an interpretation 
misses the importance in the breakdown of the numbers. Indeed, the authors found 
that high-achieving students — those with A grades in high school and favorable as- 
sessment of their own academic ability and motivation — were highly likely to find 
rankings “very important.” Additionally, Anne Machung has reported that parents of 
high-achieving high school students pay more attention to rankings than do their 
children, with two-thirds of them identifying U.S. News as “very helpful” in evaluat- 
ing a school’s quality. 78 While the majority of students may not consider a school’s 
placement in a popular ranking, 79 those with the greatest academic ability are di- 
rectly or indirectly (through their parents) influenced by rankings. This latter group 
is, of course, the kind of student that every college is most interested in recruiting. 

Empirical evidence about the relationship between an institution’s U.S. News rank
and its incoming freshman class further reveals the influence of rankings on high
school students’ college decisions. Two studies have found significant effects on
colleges’ lagged admissions outcomes after a change in rank in U.S. News. The first
study, by Monks and Ehrenberg, looked only at selective private institutions and
found that falling one spot in the rankings results in a predicted increase in the
school’s rate of admittance of applicants of almost half a percentage point and a
decrease in the yield rate of those students accepted. Therefore, a less prestigious rank
requires a school to become less selective and admit a greater percentage of its
applicant pool in order to fill its classrooms. The end result is the lowering of average
SAT scores among incoming freshmen. 80 Marc Meredith’s study confirmed and built
on Monks and Ehrenberg’s results by looking at 233 doctoral research universities,
both public and private. Meredith found that a movement from the second to the
first quartile was associated with a 1.5 percent increase in the number of students from
the top ten percent of their high school class, a 4 percent decrease in the school’s
admittance rate and, for public schools, an increase in average SAT scores of almost
twenty points. 81

Monks and Ehrenberg also explored the effects of U.S. News rank on a school’s pric- 
ing decisions. They found that while a change in rank did not affect the (all private) 
schools’ “sticker price,” less visible discounts were associated with a decrease in 
rank. A less favorable rank led to lower thresholds of expected self-help and more
generous financial aid grants, with a drop of ten places resulting in a 4 percent
reduction in aid-adjusted tuition and an overall decrease in net tuition. 82 Indeed, this
ability of private schools to adjust their pricing in reaction to changes in rank was 
hypothesized by Meredith as explaining why the drop in SAT scores was only statis- 
tically significant for public schools. 83 

Perverse Incentives 


Under the safe assumption that most, if not all, colleges and universities prefer to 
admit higher-achieving over lower-achieving students, the above effects of the U.S. 
News ranking provide strong incentives for schools to spend resources to better 
their position. Unfortunately, many of the components used to calculate a school’s 
rank in U.S. News have a dubious connection with actual academic quality. Kevin
Carey argues that 95 percent of the ranking is based directly or indirectly on institu- 
tions’ “wealth, fame and exclusivity,” and as a result they “focus an inordinate atten- 
tion on fundraising, marketing, and attracting faculty with outsized scholarly repu- 
tations — at the expense of the core missions of access and undergraduate learn- 
ing.” 84 

While the direct impact may not be so radical as to result in higher education’s
“management by U.S. News and World Report” that Don Hossler of Indiana University
decries, some of the factors in the U.S. News ranking surely must contribute to
the inefficient use of a university’s resources. For example, Hossler has provided
anecdotal evidence of how the use of a school’s acceptance rate in the rankings has
influenced schools’ admissions processes. One institution, he wrote, developed a
two-part application whose first part requires no application fee and matters little
in the final admission decision but still counts as an application. Admission
officials at the institution acknowledged that the two-tiered system was put in
place in order to lower their acceptance rate and increase their overall ranking.
Hossler tells of another university that, for the same reason, now provides less
information about admissions standards, hoping that more students will apply and be
rejected. 85 While these practices probably have no impact on academic quality one
way or the other, they do represent an expenditure of real resources (admissions
staff’s time, student applicants’ time, postage, etc.) that could be saved, or put to
better use, in the absence of the acceptance rate criterion.

Other components of the U.S. News ranking that encourage inefficiency are the 
measurement of expenditures on faculty resources, expenditures for research, and 
the peer assessment score. Thirty-five percent of the faculty resources component is 
determined by the average salaries of full and tenured professors. Yet, as Hossler 
points out, a school that increases faculty salaries or recalculates how it reports 
their benefits to U.S. News will not necessarily improve the quality of instruction, 
but it will improve the school’s rank. 86 The U.S. News ranking also provides
incentives for overinvestment in research. Spending more money on research directly
rewards schools through the financial resources component, and most likely indirectly
through the peer assessment component, since previous studies have demonstrated 
a link between reputation and research productivity. As a result, colleges and uni- 
versities interested in improving their rank would have reason to push their profes- 
sors to prioritize seeking grants for and spending time on expensive research pro- 
jects over improving their quality of teaching, since the latter does not directly affect 
an institution’s rank in U.S. News. 

The use of any expenditure variable in a ranking has adverse effects on efficiency. As
Ehrenberg points out, judging a school by its expenditures per student actually
provides disincentives for cutting costs and keeping tuition down. In the U.S. News
ranking, if two colleges provide the same academic quality but one does so while
spending less, then, all other factors being equal, the lower-spending school would
actually receive a lower ranking than the school that provided the same quality at
greater cost to its students (and to taxpayers, if the school is public). In light of
this, Ehrenberg notes that “no administrator in his or her right mind would take
actions to cut costs unless he or she had to.” 87
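Ehrenberg's point can be shown with a small worked example. The sketch below assumes a hypothetical composite score in which a spending index carries a fixed weight; the weight, the spending cap, and the two schools are invented for illustration and are not the actual U.S. News formula.

```python
def composite_score(quality, spending_per_student,
                    spending_weight=0.10, spending_cap=50_000):
    """Hypothetical composite: a weighted mix of quality and a spending index.

    quality: academic quality on a 0-1 scale (assumed measurable here).
    spending_per_student: dollars spent per student.
    spending_weight: share of the score given to spending (an assumption).
    spending_cap: spending level that earns the maximum index (an assumption).
    """
    spending_index = min(spending_per_student / spending_cap, 1.0)
    return (1 - spending_weight) * quality + spending_weight * spending_index


# Two hypothetical schools delivering identical quality at different cost.
frugal = composite_score(quality=0.80, spending_per_student=20_000)
costly = composite_score(quality=0.80, spending_per_student=40_000)

print(round(frugal, 2), round(costly, 2))
```

With quality held equal, the school that spends twice as much scores higher, so any positive weight on expenditure penalizes the more efficient institution.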

The concern over these perverse incentives has led some to argue that college rankings
help to fuel an “academic arms race” in which colleges insatiably consume resources
in hopes of increasing their prestige. However, rankings can be designed in
a way that mitigates this escalation. Regression analysis demonstrates that while there
is a positive relation between an institution’s instructional spending and its ranking
score in U.S. News, this relationship is negative in the Forbes/CCAP ranking (see
Table 4). 88 Restricting the analysis to those 107 schools included in the revealed
preference ranking study done by Avery et al. (see below for discussion) allows us to
evaluate the relationship between spending and rank for the most academically elite
schools, which are arguably the most keenly affected by the “academic arms race.” The
results reveal that both instructional and research spending are highly significant and
positively correlated to schools’ scores in the U.S. News ranking while they are highly
significant and negatively correlated to schools’ scores in the Forbes/CCAP ranking
(see Tables 5 and 6). Neither variable is significant to a school’s rank in the Avery
study.


Fig. 1: Correlation Between USNWR Ranks with Previous Year's Rank (National Universities)

Fig. 2: Correlations Between USNWR Rank with Previous Year's Rank (Liberal Arts College)

These regressions demonstrate that while it is possible to “buy” your way up the ranks
in U.S. News, rankings can be designed to discourage spending (Forbes/CCAP) or to be
spending neutral (Avery et al.). These latter two ranking systems do not provide the
perverse incentives for inefficient spending that exist in the U.S. News rankings and
instead provide incentives for institutions of higher education to rein in their
spending habits, thereby helping to disarm the “academic arms race” that has little
relation to the improvement of academic quality.


IV. College Rankings Reform 


In light of the criticisms and adverse effects of current ranking systems — especially 
those of the hegemonic U.S. News — the case for reform should be easily accepted. 
However, only reform (and not abandonment) of college rankings will result in the 
better use of resources in higher education. Kevin Carey argues that the rise of 
global rankings should completely put to rest the idea that institutions can boycott 
their way back to the pre-ranking days. 89 The history detailed above demonstrates 
that rankings were entrenched as a part of American culture long before the
first international ranking, and unequivocally supports Carey’s claim that “so long
as companies can publish magazines and students can choose colleges, someone
will create college rankings that people will read and care about.” 90

Table 2: Correlations of Component Ranks to Overall Rank in U.S. News (National Universities)

                                 1993     1998     2009
Alumni Giving Rank               N/A      0.55     0.52
Faculty Resources Rank           0.78     0.67     0.68
Financial Resources Rank         0.62     0.66     0.77
Graduation and Retention Rank    0.55*    0.80     0.89
Reputation Rank                  0.64     0.72     0.85
Selectivity Rank                 0.77     0.81     0.73

Table 3: Correlations of Component Ranks to Overall Rank in U.S. News (Liberal Arts Colleges)

                                 1993     1998     2009
Alumni Giving Rank               N/A      0.47     0.68
Faculty Resources Rank           0.64     0.48     0.65
Financial Resources Rank         0.75**   0.65     0.79
Graduation and Retention Rank    0.53*    0.72     0.81
Reputation Rank                  0.86     0.91     0.91
Selectivity Rank                 0.67     0.77     0.85


If higher education does not want to be judged by U.S. News or any other newsmagazine
that enters the ranking business, its only alternative is to help provide
something better. As Carey writes, higher education “can’t choose whether to have
rankings or not, only whether they’ll be good or bad.” 91 Hossler argued that data
focusing on what students do after they enroll (including outcomes, surveys, and
assessments of currently enrolled students and alumni) are more reliable indicators of
a school’s quality than those used in U.S. News. 92 In the past, there has been a lack
of easily usable data better than those collected by U.S. News, but recent
developments in academic quality assessment of higher education have helped to fill this
void. 93 Below are three indicators of academic quality along the lines of Hossler’s
prescription whose use could greatly improve college rankings and provide better
incentives to institutions of higher education, along with suggestions on how best to
compile such indicators into overall rankings.

The National Survey of Student Engagement

The National Survey of Student Engagement (NSSE, pronounced “Nessie”) seeks to
determine how successful colleges are at promoting those experiences that lead
directly to student learning. The survey is administered to a representative sample of
freshmen and seniors at each participating school. Categories surveyed include
academic challenge, active and collaborative learning, student-faculty interaction,
enriching education experiences, and supportive campus environment. Some of the
specific measures within these categories include the number of books assigned and
lengthy papers written, the time spent on academic and other pursuits, the
synthesizing of complex ideas, how often students work with other students, how often
they interact with faculty outside of class, and the availability of experiences such as
study abroad and culminating senior experiences. All of the measures were chosen
in light of extensive research that links them to high-quality undergraduate
outcomes. 94

NSSE data show that at least some of these direct indicators of academic quality are
not being captured by the U.S. News rankings. Carey reported that while some elements
of the survey correlated with the magazine’s rankings, a school’s peer assessment score
for “academic reputation,” the largest single component in U.S. News,
was not correlated with its promotion of active learning, student-faculty interaction
or a supportive campus environment, as measured by NSSE. 95 Additionally, a study
by Gary Pike, after controlling for student characteristics at fourteen public research
universities, found no statistically significant relation between U.S. News measures
and NSSE benchmarks, except for one. Pike concluded that “the quality of a student’s
education ... seems to have little to do with resources and reputation.” 96 It is
not surprising, then, that two colleges that make their NSSE data public, Miles College
and Jackson State University, are in the third and fourth tiers of U.S. News,
respectively, yet score above the national average (sometimes dramatically so) in
many of the NSSE measures. 97 Although hundreds of schools participate in the survey
nationally, the most prestigious private schools largely do not. More widespread
use of NSSE and a transparent reporting of the results could help improve rankings
qualitatively or, in some cases, perhaps even substitute for rankings.

The Collegiate Learning Assessment and Learning Outcomes 

The Collegiate Learning Assessment (CLA) is a test developed by the Council for Aid 
to Education (CAE) and is administered to participating schools’ freshmen in the fall 
and seniors in the spring. Rather than testing knowledge in a multiple-choice for- 
mat, CLA focuses on measuring students’ abilities for critical thinking, analytic rea- 
soning, written communication and problem solving. Tasks include interpreting, 
analyzing, and synthesizing information; articulating complex ideas; examining 
claims and evidence; and supporting ideas with relevant discussion. CAE states as a 
goal the simulation of complex situations that every college graduate can be expected
to face some day. CLA is designed to measure the reasoning and communication
skills that are largely regarded as an important outcome of a successful college
education. 98

In addition to CLA’s direct measuring of an institution’s absolute output of students 
with these important skills, another major advantage of CLA is its ability to also 
measure the value-added element of a college. With the test administered to both 
freshmen and seniors, it is possible to get a picture of how the education at an insti- 
tution is improving the students over four years. This helps mitigate the shortfall of 
other outcomes-based indicators that make it difficult to know if good outcomes are 
more related to the students coming into the school or to the school’s practices. Fur- 
thermore, using statistical analysis, it is possible to develop a model that predicts 
the expected CLA scores based on the incoming students’ SAT scores. This model 
can then reveal those schools that outperform and underperform on CLA scores 
compared to the SAT scores of their incoming students. 99 These value-added meas- 
urements made possible through the CLA would be much more helpful in indicating 
a school’s academic quality than are criteria such as incoming freshmen’s SAT 
scores. While the latter reveals the scholastic ability of students a year before enter- 
ing college (and, therefore, probably tells us more about their prior education), CLA 
potentially reveals how much students have actually improved while at college. 100 
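The value-added comparison described above can be sketched as a residual calculation: fit a line predicting CLA scores from incoming SAT scores, then see which schools land above or below their prediction. The simple least-squares line and the four schools' scores below are assumptions made for illustration; the text does not specify the statistical model actually used, and none of these numbers are real CLA or SAT results.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def value_added(sat_scores, cla_scores):
    """Actual CLA score minus the CLA score predicted from SAT, per school."""
    slope, intercept = fit_line(sat_scores, cla_scores)
    return [cla - (intercept + slope * sat)
            for sat, cla in zip(sat_scores, cla_scores)]


# Hypothetical school-level averages (not real data).
schools = ["School A", "School B", "School C", "School D"]
sat = [1000, 1100, 1200, 1300]
cla = [1050, 1150, 1180, 1260]

residuals = value_added(sat, cla)
for name, r in zip(schools, residuals):
    label = "outperforms" if r > 0 else "underperforms"
    print(f"{name}: {r:+.1f} ({label} its predicted score)")
```

In this toy data set the school with the second-lowest incoming SAT average shows the largest positive residual, which is the kind of result a ranking based on raw SAT scores would miss.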

Once again, there is evidence that the indirect measures used by U.S. News fail to
capture the actual academic output of the colleges and universities it ranks. Accord- 
ing to Carey, in the University of Texas system, the schools whose CLA scores most 
outperform the predicted results based on their incoming students’ SAT scores are 
those that are ranked lowest by U.S. News. The highest ranked school, UT-Austin, 
falls below expected performance. 101 Again, CLA suffers in that it is used by only a 
small (but growing) minority of schools. 102 

It would also be possible to use knowledge-based tests of actual learning to measure 
collegiate performance. Similar to the CLA, one could test students at the beginning 
and end of their education in order to calculate the intervening change in knowledge 
to obtain a different measure of value added. Although there is one such effort that 
ostensibly measures “civic literacy” based on a test of knowledge of facts and princi- 
ples of history, political institutions, and economics, its use has been confined to a 
sample of fewer than one hundred schools. 


Post-Graduate Career Success 

While post-graduate success of alumni has been used as an indicator of a college’s
quality since the earliest rankings, and as recently as the Forbes/CCAP 2008 ranking,
two recent developments in gathering improved data deserve mention as possible
future reforms of college rankings. A few states have begun to link employment
data about wage earnings with education data from public schools. Florida has the 
most advanced information system in which each student has a unique identifica- 
tion number that travels with him or her from K-12 and into higher education. 

When these records were matched up with earnings data from the state’s unemploy- 
ment insurance system, there were some surprising results. The four public univer- 
sities in Florida that produced the highest earners were among the six worst Florida 
schools ranked by U.S. News. Although this may be partially explained by the top 
earners from the higher ranked schools leaving the state or going on to grad school, 
Carey argues that the discrepancy is too large to be completely covered by such ex- 
planations, 103 and through the use of migration data, this problem could be con- 
trolled. 

Arguably the best way to judge post-graduate career success is to use employers as 
the rankers — or at least as the providers of data. In 2008, Boeing became the first 
major company to be involved in a college ranking. Using data from internal em- 
ployee evaluations of its engineers, matched to data about their alma maters, Boeing 
created a ranking system to determine those colleges that have produced the most 
valuable workers in accordance with the company’s goals and standards. The pro- 
ject found that the review of employee evaluations demonstrated significant differ- 
ences in the quality of graduates from different schools, and the company has said 
that it will use the data to guide its decisions on hiring from and working with uni- 
versities. While Boeing kept the results of the ranking confidential and shared them 
only with the colleges and universities themselves, the project could serve as a use- 
ful model for expanded employer-based rankings. 104 

Ranking colleges using the evaluations of their graduates’ employers involves significant
advantages over current ranking systems. First, employers are in the best position
to judge whether a graduate has been properly prepared for his or her post-graduate
career, information that is vital to students who are about to make a major
investment in their education. Second, expanded employer rankings that encompass
many companies across a number of industries allow for the input of numerous
definitions of quality. Previous rankings have often suffered from a limited definition
of quality, but different employers look for different characteristics in their
employees, the aggregation of which provides a picture of which schools generally best
prepare their students. Finally, and related to this last point, employee
evaluation-based rankings incorporate outputs that are otherwise very difficult to quantify.
Other forms of outcome-based rankings, such as those measuring entries in Who’s
Who or graduates’ salaries, are often criticized for having a limited definition of
“success.” However, employee evaluations often combine judgments of technical
skills with interpersonal skills and intangible qualities that are lost in most ranking
mechanisms, yet which some schools may better develop than others.

Employer-based rankings do have their own drawbacks. First, they suffer from the
same problem as all other outcomes-based rankings in that it is difficult to determine
whether a highly ranked school produces good graduates because of its inherent
quality or because it attracted better students to begin with because of its high
rank. Second, small colleges might not be able to produce enough graduates to generate
results in employer rankings that would be statistically significant. The final
drawback might make a national employer-based ranking system unfeasible: a project
like Boeing’s carries a hefty price tag. Former acting assistant secretary for
postsecondary education Cheryl Oldham acknowledged that most businesses could not
spend the amount of time and money it took Boeing to complete its ranking. 105 It
might be possible to mitigate this cost, though, as some private entrepreneurs have
developed Web sites with some postgraduate performance data by school, notably
www.carreers.com.

Revealed Preference Rankings 


Another potential problem with data published in both the U.S. News rankings and
college guidebooks is their inherent susceptibility to institutional manipulation.
Particular institutions can attempt to influence their rank by carefully adjusting several
of the factors (especially admissions trends); there have been several well-documented
cases of this phenomenon over the years. 106 A study by Avery et al.
sought to correct this problem by generating a “revealed preference” ranking of
schools. Relying on more than just raw admissions and matriculation rates, this
study ranked schools based upon student desirability. That is, a school’s rank was
based upon students’ preference for that school relative to other schools to which
they had been admitted. The study looked at the enrollment choices of high-performing
students by modeling the students’ decisions as tournaments between competing
schools. The sample of students included in the survey was not fully representative
of all college applicants, and, therefore, the ranking was restricted to academically
elite institutions in the United States. 107
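To make the tournament idea concrete, the sketch below treats each student's enrollment choice as a "win" for the chosen school over every other school that admitted the student, and updates ratings with a standard Elo-style rule. This is a stand-in chosen for brevity, not the estimator used by Avery et al., whose statistical model is not described here; the `k` factor, the starting rating, and the three students' choices are all hypothetical.

```python
def revealed_preference_ratings(choices, k=16.0, start=1500.0):
    """Elo-style ratings from students' enrollment choices.

    choices: list of (chosen_school, admitted_schools) pairs, where
        admitted_schools includes the chosen school.
    k: update step size (an assumption).
    start: initial rating for every school (an assumption).
    """
    ratings = {}
    for chosen, admitted in choices:
        ratings.setdefault(chosen, start)
        for school in admitted:
            ratings.setdefault(school, start)
        for rival in admitted:
            if rival == chosen:
                continue
            # Probability the chosen school was expected to win this matchup.
            expected = 1 / (1 + 10 ** ((ratings[rival] - ratings[chosen]) / 400))
            delta = k * (1 - expected)
            ratings[chosen] += delta
            ratings[rival] -= delta
    return ratings


# Hypothetical choices by three admitted students.
choices = [
    ("A", ["A", "B", "C"]),  # admitted to all three, enrolls at A
    ("A", ["A", "B"]),       # admitted to A and B, enrolls at A
    ("B", ["B", "C"]),       # admitted to B and C, enrolls at B
]

ratings = revealed_preference_ratings(choices)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Because a school gains rating only when admitted students actually pick it over alternatives, adjusting its admit rate alone does not move it up, which is the manipulation problem this kind of ranking is meant to address.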

Although this ranking should not be interpreted as a direct measure of institutional 
academic quality, the authors argued that their methodology for the rankings has 
merit because college students (particularly the best students) are concerned about 
their peers’ college choices. College students are interested in attending schools that 
attract the best students, because the institutional reputation derived from enrolling 
many good students will likely reflect favorably upon all graduates of these
institutions. While the authors of this rankings study did not consider their work to be a
completely new and independent college ranking, they argued that the “revealed
preference” of students could be an additional component used to reform current
ranking systems. 108

Do-It-Yourself Rankings 

Another recent development in college rankings is the emergence of “do-it- 
yourself’ (DIY) rankings. Rather than “pre-packaged” rankings where researchers 
collect data and then decide on each factor’s weight in the overall ranking, DIY rank- 
ings capitalize on web technology to allow prospective students to assign their own 
weight to each factor so as to produce a ranking that best reflects their individual 
preferences. For example, after choosing a subject, the Center for Higher Education 
Development (DAAD) in Germany allows users to select five criteria in order of im- 
portance from an overall list of twenty-five. The resulting table lists the German uni- 
versities in order of their performance on these five criteria, as well as showing the 
relative position of each school on each criterion . 109 In addition to its own pre-
packaged annual ranking, the Canadian magazine Maclean’s allows students to
choose seven indicators (from a total of thirteen) and then assign their own numeri- 
cal weight to each. The most complex DIY ranking is produced by PhDs.org, a pro- 
ject of former Dartmouth College professor Geoff Davis. Using data from the National 
Science Foundation, the National Research Council, and the National Center for 
Education Statistics, the website allows users to choose a subject area and then rate 
the importance of twenty-eight different criteria from zero to five. The detailed result- 
ing table displays the overall rank according to the user’s weighting, each depart- 
ment’s data in the categories, and the relation of each department’s data to the 
mean . 110 
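
The weighting step common to these DIY tools can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any of the sites above computes its tables: the function name `diy_rank`, the criteria names, the sample data, and the choice of min-max normalization are all hypothetical, and every criterion is assumed to be "higher is better."

```python
def diy_rank(schools, weights):
    """Rank schools by a user-weighted sum of normalized criteria.

    schools: {school_name: {criterion: value}}, higher values are better.
    weights: {criterion: importance}, e.g. integers from 0 to 5.
    Returns a list of (school_name, score) pairs, best first.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one criterion needs a positive weight")

    scores = {name: 0.0 for name in schools}
    for criterion, weight in weights.items():
        values = [data[criterion] for data in schools.values()]
        low, high = min(values), max(values)
        span = high - low
        for name, data in schools.items():
            # Min-max normalize to [0, 1] so that units do not dominate the sum.
            normalized = (data[criterion] - low) / span if span else 0.0
            scores[name] += (weight / total) * normalized
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for three schools and two criteria.
sample_schools = {
    "Alpha": {"grad_rate": 0.90, "small_classes": 0.40},
    "Beta": {"grad_rate": 0.70, "small_classes": 0.80},
    "Gamma": {"grad_rate": 0.80, "small_classes": 0.60},
}
print(diy_rank(sample_schools, {"grad_rate": 5, "small_classes": 1}))
print(diy_rank(sample_schools, {"grad_rate": 1, "small_classes": 5}))
```

With the same underlying data, the two users in this example get opposite orderings, which is the point of a DIY ranking: the weights, not the publisher, determine which school comes out on top.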

Do-it-yourself rankings provide better information to consumers than typical pre- 
packaged rankings. Two earlier mentioned weaknesses of traditional rankings are 
that they represent only the subjective judgments of their producers and that no one 
ranking system can adequately reflect what all consumers are looking for in a col- 
lege. DIY rankings overcome these drawbacks by allowing consumers to receive in- 
formation that is easily understood and accessible, but that also reflects their own 
preferences and definition of quality . 111 The further development of DIY rankings is,
however, not a replacement for the use of better indicators of quality. Even where users can
choose their own components and weights, any ranking is only as good as the data 
collected, and it is important that this data, whether incorporated into pre-packaged 
or DIY rankings, more directly and accurately reflects the quality of academic prac- 
tices and outputs in higher education. 

V. Conclusion

Many professors and administrators alike wistfully dream of returning to the “golden
days” before college rankings allegedly ruined higher education. Such pre-ranking
days, however, are now a century past.

The continued development of new domestic and international ranking systems makes
the chance of their demise more and more remote. The call for the abolition of
rankings is not only futile, given the course of their history; it is also
illegitimate in light of academic quality rankings’ important contributions.
Regardless of their flaws (and they have many), college rankings satisfy an immense
demand for information from students, parents, and the general public about
institutions that traditionally lack transparency about their internal workings and
quality of product. A call for the abolition of college rankings represents a
disregard for accountability that should be strongly rejected in light of the large
investments that consumers and the public make in higher education.

Like the institutions they rank, college rankings also need to be held accountable. 
Improvement in the design and execution of rankings requires an increased focus on 
accurately measuring widely agreed-upon criteria of academic quality and providing 
incentives for the efficient use of resources in the pursuit of this quality. Below are 
four guidelines for academic quality ranking systems that, if followed, will increase 
their usefulness and reliability. The International Ranking Expert Group (IREG),
founded through a partnership between the UNESCO European Center for Higher
Education and the Institute for Higher Education Policy, has already produced the
“Berlin Principles on Ranking of Higher Education Institutions,” all of which the
authors endorse. However, the following list highlights and consolidates those
principles of IREG that are most relevant to the work done in this report and
provides some expansion.

Table 4: Dependent Variable is the Ranking Score, Ordinary Least Squares Estimation

Variable                                        Forbes/CCAP Score   U.S. News Score
Size (FTE Enrollment dummy)                     -5.4064****         -4.1603*
                                                (1.2618)            (1.7889)
Geographic Location (NE US = 1)                 -2.7316*            -2.5296
                                                (1.0666)            (1.4377)
Endowment size per FTE Undergraduate Student    0.0000****          0.0000****
                                                (0.0000)            (0.0000)
Instructional Expenditures per Undergraduate    -0.0095             0.0709*
                                                (0.0261)            (0.0273)
Percentage Receiving Financial Aid              20.8681****         -38.1433****
                                                (2.7577)            (4.0434)
Enrollment Rate                                 7.7364*             9.5231
                                                (3.7751)            (5.9371)
Tuition                                         0.0003****          0.0005****
                                                (0.0001)            (0.0001)
Constant                                        49.9306             72.1082
R²                                              0.4387              0.6667
N                                               568                 256

Note: Main entries are unstandardized regression coefficients. Numbers in
parentheses are standard errors. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001
Sources: Authors’ calculations.

Define academic quality. 

The preferences of potential college freshmen and the characteristics of higher
education institutions are too diverse and complex for any one system of ranking to
be equally useful to all. Each ranking makes a subjective judgment on what to
identify as “quality.” No one ranking system can legitimately claim to judge
“America’s Best Colleges”; it can only claim to judge those colleges that are best
at the criteria it chooses to measure . 112 For college rankings to adequately serve
their most important function, providing information to consumers about the quality
of product in which they are investing, readers must be able to clearly understand a
ranking’s definition of “best.”

Whether a ranking seeks to judge schools by the achievement level of students they
attract, the educational experiences on their campuses, the strength of their
professors’ teaching, the success of their graduates, or some combination thereof,
the ranking should make explicit what characteristics its compilers desire in
colleges and universities. Only then can consumers compare their personal
preferences with those of competing ranking systems and decide which one is the
best match.

Table 5: Dependent Variable is the Ranking Score, Ordinary Least Squares Estimation

Variable                                        Forbes/CCAP       U.S. News         “Revealed Preference”
Percent Female                                  19.7211**         -7.6916           -2.2496**
                                                (6.1873)          (9.0823)          (0.7273)
Percentage Receiving Financial Aid              -25.6723****      -14.6678          2.3670**
                                                (5.9030)          (8.9204)          (0.7161)
Instructional Expenditures per Undergraduate    -0.0469**         0.0669**          0.0031
                                                (0.0172)          (0.0229)          (0.0021)
“Revealed Preference” Score                     3.5382****        3.3401**          N/A
                                                (0.7685)          (1.0680)
Forbes/CCAP Score                               N/A               0.4237**          0.0486****
                                                                  (0.1274)          (0.0106)
FTE Undergraduate Enrollment                    -0.0003***        -0.0006****       0.0000
                                                (0.0001)          (0.0001)          (0.0000)
USNWR Peer Assessment Score                     10.9294****       N/A               2.3954****
                                                (2.5179)                            (0.2901)
Constant                                        9.6460            49.0529           -4.2575
R²                                              0.7608            0.6767            0.6931
N                                               109               109               109

Note: Main entries are unstandardized regression coefficients. Numbers in
parentheses are standard errors. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001
Sources: Authors’ calculations.

Use data that directly measures this definition of quality.

Once quality has been defined, rankings should measure only those criteria that
directly (or best) reflect this quality. For example, if a ranking’s definition of
academic quality includes the rigor of academic challenge or student-faculty
interaction, the benchmarks and survey techniques of NSSE better reflect the actual
level of this quality than do the components of average faculty salary or
student-faculty ratio used in U.S. News. If the amount of learning that took place
over four years is important to the definition of quality, then CLA much more
directly measures the educational value added to an institution’s graduates than do
the SAT scores of incoming freshmen.

Regardless of what criteria are chosen, only those that can be related to the
ranking’s definition of quality should be considered valid. Any criterion that does
not have a direct connection to the definition of quality simply adds noise to the
ranking and obfuscates the true quality of the institutions judged. The relation
between measure and quality should be drawn out and justified for each criterion by
the rankers, thereby creating the transparency necessary for consumers to judge the
validity and appropriateness of each ranking.


Table 6: Dependent Variable is the Ranking Score, Ordinary Least Squares Estimation

Variable                                        Forbes/CCAP       U.S. News         “Revealed Preference”
Percent Female                                  17.7724**         -4.8029           -2.2023**
                                                (6.2870)          (9.1055)          (0.7392)
Percentage Receiving Financial Aid              -23.7215***       -17.0031          2.3392**
                                                (6.0570)          (8.9019)          (0.7310)
Research Expenditures per Undergraduate         -0.0616**         0.0867**          0.0024
                                                (0.0228)          (0.0284)          (0.0028)
“Revealed Preference” Score                     3.3998****        3.3159**          N/A
                                                (0.7683)          (1.0614)
Forbes/CCAP Score                               N/A               0.4049**          0.0474****
                                                                  (0.1263)          (0.0107)
FTE Undergraduate Enrollment                    -0.0004***        -0.0006****       0.0000
                                                (0.0001)          (0.0001)          (0.0000)
USNWR Peer Assessment Score                     11.8895****       N/A               1.4615****
                                                (2.6617)                            (0.3116)
Constant                                        5.9883            50.7491           -4.3951
R²                                              0.7605            0.6789            0.6885
N                                               109               109               109

Note: Main entries are unstandardized regression coefficients. Numbers in
parentheses are standard errors. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001
Sources: Authors’ calculations.

Use data that can only be affected by actual changes in quality.

The result of only using criteria directly related to a ranking’s definition of quality is 
that only a true change in these criteria can produce a change in an institution’s 
rank. For example, in rankings composed of data from NSSE, CLA, or post-graduate
success, the only way for a school to increase its position would be to actually im-
prove its educational practices (NSSE), increase the ability of its students to think
critically (CLA), or provide the skills useful in careers. In the U.S. News rankings,
which are composed mostly of input variables, increased spending, gimmicks such as those
documented by Hossler, and changes in reporting methods can all improve a 
school’s rank even though they have questionable impact on commonplace defini- 
tions of academic quality. Many studies have come to a conclusion similar to Gary 
Pike’s, when he wrote that academic leaders’ “efforts to garner additional resources 
and enhance institutional reputation has little effect on the quality of their students’ 
education .” 113 


A ranking based on data that can be “gamed” to improve a school’s rank with little 
to no impact on the characteristics central to that ranking’s definition of quality 
should not be viewed as a valid measuring tool. For this reason, using data on out- 
comes is usually preferable to measures of inputs. To the extent that higher educa- 
tion’s ability to improve its students is a part of one’s definition of academic quality, 
only the outcomes of its graduates can demonstrate an institution’s relative 
strength. There is much less opportunity to manipulate outcomes-based data: stu-
dents either perform well on the chosen measures of success or they do not.
Institutions can satisfy the desire for a higher rank only by improving the results of
their students’ education, according to the ranking’s definition of success. A higher 
rank arrived at by spending more resources in an inputs-based ranking does not 
carry the same guarantee of improvement in results. 

Limit perverse incentives to the extent possible. 

Finally, if the criteria used in college rankings can only be affected by actual 
changes in quality, rankings will produce better incentives for colleges and universi- 
ties to spend their resources efficiently. When an institution can only improve its 
rank by improving its education practices, its students’ learning, or the prepared- 
ness of its graduates, it will spend more resources on these priorities. When inputs 
are the primary basis of a school’s rank — especially expenditure variables — it will 
forgo these priorities and instead focus on spending money and pursuing the gim- 
micks mentioned above. These criteria directly encourage profligate and inefficient 
use of resources by rewarding schools for spending more of others’ money, regard- 
less of whether these expenditures actually contribute to academic quality. 

Better criteria and design used in college rankings can help to limit the expansion of
the “academic arms race” fueled by these perverse incentives. As has been shown, 
rankings’ methodologies can be designed to reward schools for limiting their spend- 
ing, while criteria such as NSSE, CLA, and post-graduate success can help to en- 
sure that when resources are spent, they are put towards those practices that best 
promote educational quality. A ranking should reward — or at least not punish — a 
school that provides the same level of academic quality as another school but, all 
else being equal, does so at a lower cost to its students and the public. 

As demonstrated above, improved rankings based on data such as NSSE, CLA, and 
post-graduate success could keep the concern for quality focused on questions of 
educational practices, student learning, and the preparedness of an institution’s 
graduates, rather than on questions of prestige, money, and exclusivity. Such rank- 
ings could promote efficiency and defuse the current academic arms race by provid- 
ing colleges with incentives to devote their resources toward more education-related 
goals. This improved data could prevent institutions from increasing their rank 
purely through gimmicks, profligate spending on high-profile professors and lavish 
student services, or policies with little relation to students’ education. 

However, most of the current failures of rankings published in popular news maga- 
zines to live up to these principles do not originate from willful deception or lazy 
methodology but from the use of second-best data. Even though NSSE and CLA are 
now demonstrated successes, with hundreds of schools administering one or both,
most schools only agree to participate under the condition that the results are not 
made public, 114 and the most prestigious schools do not participate at all. In fact, 
U.S. News does report the NSSE results for those schools that provide the data pub-
licly, but too few schools do so to develop a useful ranking. USA Today has also
begun publishing NSSE summary data for public schools (presumably acquired 
through open-records laws) and those private schools that provide the data, but the 
limited number of schools and small amount of detailed data hamper the usefulness 
of this information. Similarly, few institutions keep complete records on their stu-
dents’ post-graduate success, and those that do are reluctant to release them. While
there are ideas about how to collect this data through employers, they have yet to 
bear fruit on the necessary scale. 

To a large extent, then, higher education has only itself to blame for the detrimental 
impacts of today’s ranking systems. As Schmotter argued in 1989 at the advent of 
the U.S. News rankings, and as Carey has continued to stress today, the only way 
for colleges and universities to improve the means of their own assessment is to pro- 
vide something better. Carey argues that the higher education establishment must 
“be far more transparent and forthcoming about its successes and failures than it 
has historically chosen to be” if it wants a better alternative to the rankings of U.S.
News and other popular publications . 115 Releasing NSSE, CLA, and post-graduate 
success data would be a good first step toward more transparency and would allow 
the producers of academic quality rankings to better serve both the consumers of 
and investors in colleges and universities. Higher education cannot make college 
rankings go away, but increased transparency and cooperation could lead to rank- 
ings that better promote true academic quality than any that have come before. Un- 
til then, it will continue to be the case that “college and university leaders who strive 
for high reputation- and resource-based rankings may be shortchanging their stu- 
dents by focusing their efforts on institutional characteristics that are largely irrele- 
vant to a high-quality education .” 116 

The rise of do-it-yourself rankings could also have potentially revolutionary effects 
by offering tailor-made assessments of colleges that fit the tastes and preferences of 
individual consumers. The commercial aspirations of the publishers of rankings 
seemingly require a single set of rankings, determined by someone’s judgment of 
what is important in evaluating university quality. But in addition to the standard 
one-size-fits-all evaluations, it is easy for us to see how providers increasingly may 
offer readers the option of custom or personalized rankings. This is particularly true 
if there are improvements in data sets and further advances in computer technology 
over time, both likely possibilities. The expansion of customizable rankings, in addi- 
tion to the improvement of pre-packaged rankings, will ensure that academic quality 
rankings better fulfill their most important social benefit, namely, providing useful 
information to the consumers of higher education. 